Abstract

We consider the problem of detecting a 'bump' in the intensity of a Poisson process or 
in a density. We analyze two types of likelihood ratio based statistics which allow for exact 
finite sample inference and asymptotically optimal detection: The maximum of the penalized 
square root of log likelihood ratios ('penalized scan') evaluated over a certain sparse set of 
intervals, and a certain average of log likelihood ratios ('condensed average likelihood ratio'). 
We show that penalizing the square root of the log likelihood ratio - rather than the log likeli- 
hood ratio itself - leads to a simple penalty term that yields optimal power. The thus derived 
penalty may prove useful for other problems that involve a Brownian bridge in the limit. The 
second key tool is an approximating set of intervals that is rich enough to allow for optimal 
detection but which is also sparse enough to allow justifying the validity of the penalization 
scheme simply via the union bound. This results in a considerable simplification in the theo- 
retical treatment compared to the usual approach for this type of penalization technique, which 
requires establishing an exponential inequality for the variation of the test statistic. Another 
advantage of using the sparse approximating set is that it allows fast computation in nearly 
linear time. 

We present a simulation study that illustrates the superior performance of the penalized 
scan and of the condensed average likelihood ratio compared to the standard scan statistic. 
1 Introduction and overview of results



The paper is concerned with the following problem: One observes an inhomogeneous Poisson
process $X_1, \ldots, X_N$ on the real line with intensity

$$\lambda(x) = \begin{cases} p\,\mu(x), & x \in I \\ q\,\mu(x), & x \notin I \end{cases}$$

where $\mu(x) > 0$ is a known function with $\int \mu < \infty$, but $p, q > 0$ and the interval $I$ are unknown.
Hence the intensity is known up to a multiplicative factor and we want to test whether this factor
is elevated on some interval $I$:

$$H_0: p = q, \qquad H_A: p > q \ \text{ for some interval } I.$$

This setting arises in a number of applications involving the detection of a 'cluster', see e.g. Glaz
and Balakrishnan (1999), Loader (1991) and Kulldorff (1997). The latter two references also
give extensions to the bivariate case, which is relevant for detecting spatial disease clusters while
adjusting for the known population density $\mu$. Since under $H_0$ the nuisance parameter $p = q$
is unknown, we follow Loader (1991) and analyze the problem conditional on $N = n$. Then
$X_1, \ldots, X_n$ are i.i.d. with density

$$f_{r,I}(x) = \frac{r\,1(x \in I) + 1(x \in I^c)}{r F_0(I) + F_0(I^c)}\, f_0(x), \quad \text{where } f_0(x) := \frac{\mu(x)}{\int \mu} \text{ and } r := \frac{p}{q}, \tag{1}$$

and the testing problem becomes $H_0: r = 1$ vs. $H_A: r > 1$, so we test whether the observations
come from a known density $f_0$ (which we may assume w.l.o.g. to be the uniform density, see
(5)) vs. the case where $f_0$ is elevated by a multiplicative factor over some interval $I$. Thus the
methodology introduced in this paper may also be applied for certain 'bump-hunting' problems,
see e.g. Good and Gaskins (1980), Hartigan (1985), Müller and Sawitzki (1991), Minnotte and
Scott (1993) or Polonik (1995).

Loader (1991) and Kulldorff (1997) address the above problem with the scan statistic, i.e. the
maximum of the log likelihood ratio statistic for varying $I$. Chan and Walther (2011) investigate a
related problem in the abstract Gaussian White Noise model. They show that the scan is generally
suboptimal for this type of detection problem, but that optimal detection is possible by averaging
likelihood ratios over a judiciously chosen collection of intervals. They also suggest that optimality
can be restored for the scan either by modifying it with a penalty term that was introduced by
Dümbgen and Spokoiny (2001) for kernel statistics in a different context, or by using the blocked
scan introduced by Walther (2010) and Rufibach and Walther (2010).

Here we show how optimal detection can be achieved in the practically important case of
intensities and densities with likelihood ratios as the principal tool for inference. The main problem
in trying to adapt the penalization technique from the abstract Gaussian White Noise model
is that the form of the penalty term depends partly on the specifics of an exponential inequality
that needs to be established for the variation of the local test statistic. This inequality has to be
established anew in each setting, and this is a quite difficult theoretical exercise, see Section 6.2.
Walther (2010) and Rufibach and Walther (2010) circumvent this problem by penalizing p-values
rather than critical values, but at the cost of a more complex methodology and more computation.

One of the main contributions of this paper is to show how the conceptually simpler penaliza- 
tion of critical values can be implemented in the important case of log likelihood ratios, without 
having to establish an exponential inequality for its variation. Our main tool is to consider an 
appropriate subcollection of the collection of all intervals. It is possible to construct such an ap- 
proximating set of intervals that on the one hand is rich enough to allow optimal detection and on 
the other hand is sparse enough to allow justifying the validity of the penalization scheme simply 
with the union bound. This approach was used in Walther (2010) in the multivariate Bernoulli 
model to penalize p-values when scanning with rectangles. Our key idea to make this approach 
work for penalizing critical values is to penalize the square root of twice the log likelihood ratio 
instead of the log likelihood ratio. This transformation results in a penalty that yields optimal 
detection. And due to the use of a sparse approximating set of intervals, the appropriate penalty 
term can be read off from the tail bound of the log likelihood ratio itself, which in this case is sim- 
ply given by Hoeffding's inequality. As will become clear from the exposition, this methodology 
should also be applicable in a wide range of other contexts, such as those cited in this section. 

We end up with a new penalty that is somewhat different from the one used in Dümbgen and
Spokoiny (2001). The form of this new penalty derives from a different limiting process (Brownian
bridge instead of Brownian motion) and simulations show that it results in a superior finite sample
performance when compared to the Dümbgen-Spokoiny penalty.

In the second part of the paper we show that averaging the likelihood ratios over a particu- 
lar approximating set of intervals (the condensed average likelihood ratio (ALR)) also results in 
optimal detection. We note that the construction of an appropriate approximating set of intervals 
plays a crucial role for both methodologies, both in terms of statistical inference and for efficient
computation: For the condensed ALR, the appropriate construction of an approximating set di- 
rectly results in optimal detection, while for the penalized scan it justifies the use of the particular 
penalty term. In both cases it results in efficient algorithms that run in almost linear time versus 
the quadratic algorithms required for evaluating all intervals. This computational aspect may well 
be the dominant concern for some users. 

In Section 5 we provide a simulation study that shows that the penalized scan and the con-
densed ALR are clearly superior to the scan, with the condensed ALR having the overall best 
performance. 



2 The scan statistic and the penalized scan 

We will work in the density setting (1), i.e. conditional on $N = n$. The main advantage of such
a conditional analysis is that it eliminates the nuisance parameter p under the null hypothesis, and 
hence this approach avoids the problematic performance of likelihood ratio tests when a parameter 
is misspecified. Another advantage of the conditional analysis is that it allows for exact finite 
sample inference as will be seen below. Finally, we note that the conditional analysis does not 
require the underlying point process to be a Poisson process, but it is also valid for certain other 
processes that are not Poisson processes or that do not even have independent increments. 

A standard computation shows that for a given interval $I$ the log likelihood ratio test statistic
for testing $H_0: r = 1$ vs. $H_A: r > 1$ in (1) is given by

$$\mathrm{logLR}_n(F_0(I), F_n(I)) := \begin{cases} n F_n(I) \log \dfrac{F_n(I)}{F_0(I)} + n(1 - F_n(I)) \log \dfrac{1 - F_n(I)}{1 - F_0(I)} & \text{if } F_n(I) > F_0(I) \\ 0 & \text{else,} \end{cases}$$

where $F_n$ denotes the empirical cdf. Since $I$ is unknown, it is customary to assess the evidence
against $H_0$ with the scan statistic (maximum likelihood ratio statistic)

$$M_n := \sup_{\text{intervals } I \subset \mathbb{R}} \mathrm{logLR}_n(F_0(I), F_n(I)) = \max_{1 \le j < k \le n} \mathrm{logLR}_n\Big(F_0([X_{(j)}, X_{(k)}]), \frac{k-j+1}{n}\Big), \tag{2}$$

where the equality follows from elementary considerations. Kulldorff (1997) gives a derivation
of the maximum likelihood ratio without conditioning on $N$ that results in the same formula for
$M_n$. As observed experimentally by Neill (2009a) and Chan (2009) and explained theoretically
by Chan and Walther (2011) in an abstract Gaussian regression setting, the scan will generally be
suboptimal for detection. One way to rectify the situation is by adding a penalty term as introduced
by Dümbgen and Spokoiny (2001) for kernel estimates. We propose to use the following form for
a penalized scan:



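$$P_n := \max_{I \in \mathcal{J}_{app}} \left( \sqrt{2\,\mathrm{logLR}_n(F_0(I), F_n(I))} - \sqrt{2 \log \frac{e}{F_n(I)(1 - F_n(I))}} \right),$$

where the data-dependent collection of intervals $\mathcal{J}_{app}$ is defined below. For some applications
it may be more appropriate to use a collection of intervals that is not data-dependent, see e.g.
Neill (2009b). We therefore also analyze the variant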
$$P_n^0 := \max_{I \in \mathcal{J}^0_{app}} \left( \sqrt{2\,\mathrm{logLR}_n(F_0(I), F_n(I))} - \sqrt{2 \log \frac{e}{F_n(I)\big(1 - \min(F_n(I), \tfrac12)\big)}} \right),$$

where $\mathcal{J}^0_{app}$ is defined below. Penalizing the square root of $\mathrm{logLR}_n$ instead of $\mathrm{logLR}_n$ is crucial
if one wants to use a simple, additive penalty term that yields optimal detection: Calculations
show that an analogously derived penalty term for $\mathrm{logLR}_n$ will not result in optimal detection,
unless one is willing to work with an intricate non-additive penalty. The above penalty is different
from what one would expect from the work in the abstract Gaussian settings in Dümbgen
and Spokoiny (2001) and Chan and Walther (2011). That work suggests penalizing the statistic
pertaining to the interval $I$ with $\sqrt{2 \log (e/F_n(I))}$. However, it will be seen in Section 6.2 that the
relevant limiting process of $\sqrt{\mathrm{logLR}_n}$ does not involve the increments of Brownian motion but
those of the Brownian bridge. While a theoretical analysis shows that one can still employ the
$\sqrt{2 \log (e/F_n(I))}$ penalty for the latter case (provided that $F_n(I)$ stays bounded away from 1), it
also shows that there is some flexibility in designing the penalty. In fact, the theoretical analysis
in Section 6.2 as well as simulations show that for a Brownian bridge it is much preferable to use
the penalty $\sqrt{2 \log \frac{e}{F_n(I)(1 - \min(F_n(I), \frac12))}}$, and this is essentially the penalty we used for $P_n$ since
we always have $F_n(I) \le \frac12$ there.
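The two ingredients of a penalized term, the one-sided log likelihood ratio and the additive penalty, are simple to evaluate for a single interval. The following is a minimal sketch rather than the authors' implementation; the function names `log_lr`, `penalty` and `penalized_term` are hypothetical, and the penalty is taken in the form with $\min(F_n(I), \frac12)$ discussed above.

```python
import math


def log_lr(a: float, b: float, n: int) -> float:
    """One-sided log likelihood ratio logLR_n(a, b), with a = F_0(I), b = F_n(I).

    Returns 0 unless the empirical measure b exceeds the null measure a.
    """
    if not (0.0 < a < 1.0) or b <= a:
        return 0.0
    value = b * math.log(b / a)
    if b < 1.0:  # the second term vanishes in the limit b -> 1
        value += (1.0 - b) * math.log((1.0 - b) / (1.0 - a))
    return n * value


def penalty(fn: float) -> float:
    """Additive penalty sqrt(2 log(e / (F_n(I) (1 - min(F_n(I), 1/2)))))."""
    return math.sqrt(2.0 * math.log(math.e / (fn * (1.0 - min(fn, 0.5)))))


def penalized_term(f0: float, fn: float, n: int) -> float:
    """Penalized square root of twice the log likelihood ratio for one interval."""
    return math.sqrt(2.0 * log_lr(f0, fn, n)) - penalty(fn)
```

For example, `penalized_term(0.1, 0.2, 100)` evaluates a single interval whose null probability is 0.1 and which contains 20 of 100 observations; a penalized scan takes the maximum of such terms over a collection of intervals.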

As approximating set $\mathcal{J}_{app}$ we use the univariate version of the approximating set introduced
in Walther (2010):

$$\mathcal{J}_{app} = \bigcup_{\ell=2}^{\ell_{max}} \mathcal{J}_{app}(\ell), \quad \text{where } \ell_{max} := \Big\lfloor \log_2 \frac{n}{\log n} \Big\rfloor \text{ and}$$

$$\mathcal{J}_{app}(\ell) = \Big\{ [X_{(j)}, X_{(k)}] : j, k \in \{1 + i d_\ell,\ i = 0, 1, \ldots\} \text{ and } m_\ell < k - j \le 2 m_\ell \Big\}, \quad \text{where } m_\ell = n 2^{-\ell},\ d_\ell = \Big\lceil \frac{m_\ell}{6\sqrt{\ell}} \Big\rceil.$$

$\mathcal{J}^0_{app}$ is defined analogously with the endpoints of the intervals given by the corresponding
quantiles of $F_0$ rather than those of $F_n$, i.e. we use $[F_0^{-1}(\frac{j}{n}), F_0^{-1}(\frac{k}{n})]$ in place of $[X_{(j)}, X_{(k)}]$.
A simple counting argument shows that $\#\mathcal{J}_{app}(\ell) \le 36\,\ell\,2^{\ell}$, hence both $\mathcal{J}_{app}$ and $\mathcal{J}^0_{app}$ have
a cardinality that is bounded by $\sum_{\ell=2}^{\ell_{max}} 36\,\ell\,2^{\ell} = O(n)$. Thus both $P_n$ and $P_n^0$ can be computed
in $O(n \log n)$ steps, where the complexity is dominated by sorting the data. This advantage of
efficient computation plays an important role in many applications.
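The construction of the approximating set can be sketched by enumerating the index pairs $(j, k)$ level by level. This is an illustrative sketch under stated assumptions: the function name `approx_set` is hypothetical, $\ell_{max}$ is rounded down and $d_\ell$ is rounded up to an integer, and the size condition is read as $m_\ell < k - j \le 2 m_\ell$.

```python
import math


def approx_set(n: int):
    """Enumerate (level, j, k) for the intervals [X_(j), X_(k)] of the approximating set.

    Indices are 1-based ranks of the sorted sample. Assumed conventions:
    l_max = floor(log2(n / log n)), m_l = n 2^{-l}, d_l = ceil(m_l / (6 sqrt(l))).
    """
    l_max = math.floor(math.log2(n / math.log(n)))
    triples = []
    for level in range(2, l_max + 1):
        m = n * 2.0 ** (-level)
        d = math.ceil(m / (6.0 * math.sqrt(level)))
        # smallest multiple of d that is strictly larger than m
        first_gap = d * (math.floor(m / d) + 1)
        for j in range(1, n + 1, d):  # left endpoints on the grid 1 + i*d
            k = j + first_gap
            while k <= n and k - j <= 2 * m:
                triples.append((level, j, k))
                k += d  # right endpoints stay on the same grid
    return triples
```

With `n = 1024` this produces on the order of $2 \times 10^4$ intervals, compared with roughly $5 \times 10^5$ pairs $j < k$, which illustrates why evaluating a statistic on the approximating set is much cheaper than scanning all intervals.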

By definition $\mathcal{J}_{app}(\ell)$ contains intervals whose empirical measure is roughly the same, up to a
factor of two. The 'largest' intervals at $\ell = 2$ have empirical measure up to $\frac12$; there is no practical
interest in considering larger intervals, and this upper bound can be changed as appropriate. The
'smallest' intervals at $\ell = \ell_{max}$ have empirical measure of about $\log n / n$ since in a density setting
it is not possible to obtain consistent inference with fewer observations. The key parameter of
the approximating set is $d_\ell$, which describes how finely the endpoints are spaced as a function of
the length of the interval: Small intervals require a fine spacing for a good approximation, while
for large intervals a coarser spacing is sufficient. The particular formula given by $d_\ell$ ensures that
intervals of all sizes are approximated sufficiently well to guarantee optimal detection, as shown
in Theorem 2, while at the same time the approximating set is sparse enough that one can control
$P_n$ simply with the union bound (this property does not hold e.g. for the approximating set given
in Rufibach and Walther (2010)):

Proposition 1. Both $P_n$ and $P_n^0$ are $O_p(1)$ under $H_0$.

Before proving Proposition 1 we note that the second key ingredient besides the sparse approximating
set is the 'standardization' of $F_n(I)$ in terms of the transformation $\sqrt{2\,\mathrm{logLR}_n(F_0(I), F_n(I))}$
instead of the usual way to standardize a binomial random variable. The latter case results in
one tail that is not subgaussian and which is heavier than the other tail, see Shorack and Wellner
(1986, Ch. 11.1), a problematic fact for the multiple testing set-up considered here. In contrast,
the 'standardization' via the above likelihood ratio transformation leads to clean subgaussian tails:
For a fixed interval $I$ and $t > 0$

$$\mathbb{P}\Big( \sqrt{2\,\mathrm{logLR}_n(F_0(I), F_n(I))} > t \Big) \le \exp\Big(-\frac{t^2}{2}\Big). \tag{3}$$

$\log_2$ and $\log$ denote the logarithm with base 2 and $e$, respectively.

While we could not find a statement of this result in the literature, it is implicit in the proof of
the Chernoff-Hoeffding theorem, see Hoeffding (1963): That proof establishes $\mathbb{P}(F_n(I) \ge v) \le
\exp(-\mathrm{logLR}_n(F_0(I), v))$ for $v \in (F_0(I), 1]$, see A.6.1 in van der Vaart and Wellner (1996). Since
it is easily seen that the function $v \mapsto \mathrm{logLR}_n(F_0(I), v)$ is strictly increasing for $v > F_0(I)$, we
obtain $\mathbb{P}(\mathrm{logLR}_n(F_0(I), F_n(I)) > t) \le \exp(-t)$, and (3) follows. We note that (3) also holds
for the two-sided version of the likelihood ratio provided one adds the factor 2 on the right side.
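The subgaussian tail bound for a fixed interval can be checked exactly for moderate $n$, since $nF_n(I)$ is binomial under the null hypothesis. The sketch below sums the binomial probabilities of all outcomes for which the transformed statistic exceeds $t$ and compares the result with $\exp(-t^2/2)$; the helper names `log_lr` and `exact_tail` are hypothetical.

```python
import math


def log_lr(a: float, b: float, n: int) -> float:
    """One-sided log likelihood ratio logLR_n(a, b); zero unless b > a."""
    if not (0.0 < a < 1.0) or b <= a:
        return 0.0
    value = b * math.log(b / a)
    if b < 1.0:
        value += (1.0 - b) * math.log((1.0 - b) / (1.0 - a))
    return n * value


def exact_tail(n: int, p: float, t: float) -> float:
    """P( sqrt(2 logLR_n(p, K/n)) > t ) for K ~ Binomial(n, p), by direct summation."""
    total = 0.0
    for k in range(n + 1):
        if math.sqrt(2.0 * log_lr(p, k / n, n)) > t:
            total += math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
    return total


if __name__ == "__main__":
    for t in (1.0, 2.0, 3.0):
        print(t, exact_tail(200, 0.1, t), math.exp(-t * t / 2.0))
```

The printed pairs show the exact tail probability next to the bound; the bound is not tight for every $t$, but it holds uniformly, which is what the union bound argument requires.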
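Since $\#\mathcal{J}^0_{app}(\ell) \le 36\,\ell\,2^{\ell}$ we obtain for $\kappa > 2$:

$$\begin{aligned}
&\mathbb{P}\left( \max_{I \in \mathcal{J}^0_{app}} \left( \sqrt{2\,\mathrm{logLR}_n(F_0(I), F_n(I))} - \sqrt{2 \log \frac{e}{F_0(I)\big(1 - \min(F_0(I), \tfrac12)\big)}} \right) > \kappa \right) \\
&\quad \le \sum_{\ell=2}^{\ell_{max}} \#\mathcal{J}^0_{app}(\ell) \max_{I \in \mathcal{J}^0_{app}(\ell)} \exp\left( -\frac12 \left( \sqrt{2 \log \frac{e}{F_0(I)\big(1 - \min(F_0(I), \tfrac12)\big)}} + \kappa \right)^2 \right) \\
&\quad \le \sum_{\ell=2}^{\ell_{max}} 36\,\ell\, \exp\big( -\kappa \sqrt{\ell} - \kappa^2/2 \big) \qquad \text{since } F_0(I) \le 2 \times 2^{-\ell} \\
&\quad \le C \exp(-\kappa^2/2)
\end{aligned}$$

for some universal $C > 0$, proving Proposition 1 for $P_n^0$, but with $F_0(I)$ instead of $F_n(I)$ in the
penalty term. Using this result and (3) one readily finds uniform bounds on the ratios $F_n(I)/F_0(I)$
which allow replacing $F_0$ by $F_n$ in the penalty term.

The proof of $P_n = O_p(1)$ is analogous, the main difference being that the intervals $I$ are now
random. Since by construction all intervals $I \in \mathcal{J}_{app}$ have empirical measure at least $\log n / n$,
Lemma 2 shows that the tails of $\sqrt{2\,\mathrm{logLR}_n}$ are close enough to subgaussian that the above argument
goes through, concluding the proof of Proposition 1. $\Box$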

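Finally we will also consider the direct penalization of the scan (2), i.e. without approximating
the set of all intervals:

$$P_n^{all} := \max_{1 \le j < k \le n:\ \log n \le k - j \le n/2} \left( \sqrt{2\,\mathrm{logLR}_n\Big(F_0([X_{(j)}, X_{(k)}]), \frac{k-j+1}{n}\Big)} - \sqrt{2 \log \frac{e n^2}{(k-j)(n-k+j)}} \right).$$

Our main reason for investigating $P_n^{all}$ is that we need the following result for our theoretical
analysis of the average likelihood ratio in Section 3:

Theorem 1. $P_n^{all} = O_p(1)$ under $H_0$.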

The restriction $k - j \ge \log n$ is necessary for this result to hold since for very small intervals
the tail of the test statistic is far from subgaussian, causing the null distribution to blow up, see
Lemma 2. Of course, those small intervals are not required for optimal detection, and $\mathcal{J}_{app}$ does
not employ them either.

The proof of Theorem 1 is neither short nor straightforward, using the Hungarian construction.
In contrast, the short proof of Proposition 1 given above is essentially an application of Boole's
inequality together with a simple counting argument. This is one of the two main advantages of
using the approximating set $\mathcal{J}_{app}$, the other being the computational complexity of $O(n \log n)$,
whereas $P_n^{all}$ requires looping over $O(n^2)$ intervals.

Note that all versions of the scan introduced in this section are distribution free and thus allow
exact finite sample inference, see Section 5.



3 The condensed average likelihood ratio 

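Chan and Walther (2011) introduce the condensed average likelihood ratio in a regression setting.
Here we investigate its performance in a density context. Define

$$A_n^{cond} := \frac{1}{\#\mathcal{I}_{app}} \sum_{I \in \mathcal{I}_{app}} LR_n(F_0(I), F_n(I)),$$

which is the average of the likelihood ratios $LR_n = \exp(\mathrm{logLR}_n)$ over the approximating set of
intervals

$$\mathcal{I}_{app} = \bigcup_{\ell=2}^{\ell_{max}} \mathcal{I}_{app}(\ell), \quad \text{where } \ell_{max} = \Big\lfloor \log_2 \frac{n}{\log n} \Big\rfloor \text{ and}$$

$$\mathcal{I}_{app}(\ell) = \Big\{ (X_{(j)}, X_{(k)}] : j, k \in \{1 + i d_\ell,\ i = 0, 1, \ldots\} \text{ and } m_\ell < k - j \le 2 m_\ell \Big\},$$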

o-i a r ^i 4/5 ~ 

where mi = n2 , di = — 

log n 



Note that $\mathcal{I}_{app}$ differs from $\mathcal{J}_{app}$ used above for the scan in the choice of $d_\ell$. The different
choice of $d_\ell$ is necessary to guarantee optimal detection, but it still allows computation in almost
linear time since it is readily checked that $\#\mathcal{I}_{app} = O(n \log^2 n)$. A second difference is that $\mathcal{I}_{app}$
uses half-open intervals $(X_{(j)}, X_{(k)}]$ rather than closed intervals, with a corresponding empirical
measure $\frac{k-j}{n}$ instead of $\frac{k-j+1}{n}$. These changes guarantee that $A_n^{cond}$ will stay bounded under $H_0$:

Proposition 2. $A_n^{cond} = O_p(1)$ under $H_0$.

The density setting investigated here requires a proof that is more involved than the one in the
regression setting considered in Chan and Walther (2011). Further, in the density setting there is
no need to consider small intervals with empirical measure less than about $\log n / n$, and $\mathcal{I}_{app}$ is
defined accordingly.

$A_n^{cond}$ is also distribution free and thus allows exact finite sample inference.



4 Optimality 

Next we investigate whether the penalized scans $P_n$ and $P_n^0$ and the condensed average likelihood
ratio $A_n^{cond}$ allow optimal detection, i.e. whether they are able to detect alternatives (1) that satisfy

( A^ ErAl)-Fo{I) f- 



with e ny2 log p j^Y) ~* 00 ■ Note that both r and / may depend on n, but for simplicity we will 
not include this in our notation. Using arguments as in Dümbgen and Spokoiny (2001) and in
Walther (2010), one can show that no procedure can reliably detect alternatives $F_{r,I}$ that satisfy
(4) when $(1 + \epsilon_n)$ is replaced by $(1 - \epsilon_n)$. Thus (4) does indeed describe a condition for optimal
detection. We note that while in the regression context the 'scale' of the effect is given by the
spatial extent $|I|$, in the density context this role is played by the probability $F_{r,I}(I)$.

Theorem 2. The penalized scans $P_n$ and $P_n^{all}$ and the condensed average likelihood ratio $A_n^{cond}$
provide optimal detection, i.e. they have asymptotic power one uniformly in signals satisfying (4).
This result also holds for $P_n^0$ provided $F_0(I) \ge 2^{-\ell_{max}}$.

Thus the optimality of $P_n^0$ comes with a proviso due to the fact that the approximating set
$\mathcal{J}^0_{app}$ is built from the null model and not from the observed data: If the interval $I$ supporting the
bump is very small, then the approximating set $\mathcal{J}^0_{app}$ is not fine enough to allow optimal detection.
While this can be remedied by increasing $\ell_{max}$, such a step will severely affect the computational
complexity, and it is not clear a priori what an appropriate choice for $\ell_{max}$ would be. $P_n$ and
$A_n^{cond}$ avoid this problem by using data-dependent approximating sets. One of the consequences
of Theorem 2 is that these approximating sets are rich enough for optimal detection and there is no
need to look over all intervals as in $P_n^{all}$. This has obvious computational advantages as discussed
above, and it allows for a much simplified theoretical analysis: compare the proofs of Proposition 1
and Theorem 1. An interesting distinction between the scan and the average likelihood ratio is the
fact that the approximating set will automatically lead to optimal detection for the latter, but not
for the former: Evaluating the unpenalized scan $M_n$ on $\mathcal{J}_{app}$ or on the approximating sets given
in Neill and Moore (2004) or Arias-Castro et al. (2005) will result in optimal detection only on
the smallest scales, i.e. for the smallest values of $F_{r,I}(I)$. Optimal detection on all scales seems to
require the use of scale-dependent critical values, such as via a penalty term as in $P_n$ or via the
blocked scan introduced in Rufibach and Walther (2010) and Walther (2010).



5 A simulation study 

We illustrate the theoretical results given above with a simulation study that compares the performance
of the scan, the penalized scan $P_n$, and the condensed average likelihood ratio $A_n^{cond}$. In
order to arrive at a fair comparison, we evaluate the scan $M_n$ only over intervals that contain between
$\log n$ and $n/2$ observations. This increases the power of the scan compared to the original
definition (2) and provides the same a priori assumptions about the length of the cluster for all
three methods.

Note that since $F_0$ is known we may assume that $F_0$ is the $U[0,1]$ distribution: Applying the
transformation $Y = F_0(X)$ transforms the model (1) into

$$f_{r,\tilde I}(y) = \frac{r\,1(y \in \tilde I) + 1(y \in \tilde I^c)}{r|\tilde I| + 1 - |\tilde I|}\, 1(y \in [0,1]), \tag{5}$$

where the interval $\tilde I$ is the image of the original interval $I$ under the map $F_0$. Moreover, all the
statistics $M_n$, $P_n$, $P_n^0$, $P_n^{all}$, and $A_n^{cond}$ are seen to be distribution free. Hence we may simulate the
null distributions of these statistics by drawing $X_1, \ldots, X_n$ i.i.d. $U[0,1]$ (say), thus allowing for
exact (up to Monte Carlo simulation error) finite sample inference.
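The simulation set-up described above can be sketched in a few lines: draw from the uniform model with a bump, and approximate a critical value of a distribution-free statistic by simulating uniform samples. The sketch below is illustrative only and is not the code used for the tables; it uses a naive $O(n^2)$ scan restricted to intervals with $\log n \le k - j \le n/2$, and the names `sample_bump`, `scan` and `critical_value` are hypothetical.

```python
import math
import random


def log_lr(a: float, b: float, n: int) -> float:
    """One-sided log likelihood ratio logLR_n(a, b); zero unless b > a."""
    if not (0.0 < a < 1.0) or b <= a:
        return 0.0
    value = b * math.log(b / a)
    if b < 1.0:
        value += (1.0 - b) * math.log((1.0 - b) / (1.0 - a))
    return n * value


def sample_bump(n, r, lo, hi, rng):
    """Draw n points on [0, 1] whose density is elevated by the factor r on [lo, hi]."""
    length = hi - lo
    p_in = r * length / (r * length + 1.0 - length)  # probability of the bump interval
    xs = []
    for _ in range(n):
        if rng.random() < p_in:
            xs.append(lo + length * rng.random())
        else:
            u = (1.0 - length) * rng.random()  # uniform on the complement of [lo, hi]
            xs.append(u if u < lo else u + length)
    return xs


def scan(xs):
    """Naive scan over [X_(j), X_(k)] with log n <= k - j <= n/2, null model U[0, 1]."""
    x = sorted(xs)
    n = len(x)
    gap_min = max(1, math.ceil(math.log(n)))
    best = 0.0
    for j in range(n):
        for k in range(j + gap_min, min(n - 1, j + n // 2) + 1):
            best = max(best, log_lr(x[k] - x[j], (k - j + 1) / n, n))
    return best


def critical_value(n, level, n_sim, rng):
    """Monte Carlo approximation of the (1 - level) quantile of the scan under U[0, 1]."""
    sims = sorted(scan([rng.random() for _ in range(n)]) for _ in range(n_sim))
    return sims[math.ceil((1.0 - level) * n_sim) - 1]
```

A power estimate is then the fraction of samples from `sample_bump` whose statistic exceeds the simulated critical value. Replacing `scan` by a penalized or averaged statistic on an approximating set follows the same pattern.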

Tables 1 and 2 list the power at the 5% significance level for sample sizes $n = 10^4$ and
$n = 10^6$, respectively. Each case considers the range for the effect ratio $r$ where detection starts
to become possible, for a small interval and for a large interval $I$. These simulations illustrate
how the optimality result of Section 4 about $P_n$ and $A_n^{cond}$ sets in. In contrast, one sees that the
scan $M_n$ is competitive only for signals on the smallest scales and it is inferior to $P_n$ and $A_n^{cond}$
otherwise. In the context of regression, the inferiority of the scan at larger scales was expounded
theoretically by Chan and Walther (2011). Note that unlike in the regression context, 'scale' is not
given by the length $|I|$ but by $F_{r,I}(I)$, which is of the order $rF_0(I)$ as long as the latter quantity
stays bounded.

The simulations show that the condensed average likelihood ratio $A_n^{cond}$ has arguably the best
overall performance among the three procedures considered.
Table 1: Power in percent for detecting clusters (5) for various values of $r$ and two different lengths
of $I$ for sample size $n = 10^4$.

Table 2: Power in percent for detecting clusters (5) for various values of $r$ and two different lengths
of $I$ for sample size $n = 10^6$.

In the above simulations the finite sample exact critical values and the power were approximated
with $10^5$ and $10^3$ simulations, respectively. The location of the interval $I$ was randomized
in each simulation to avoid confounding the results with the particular construction of the approximating
sets $\mathcal{I}_{app}$ and $\mathcal{J}_{app}$. In the case of the sample size $n = 10^6$, the scan $M_n$ was evaluated on
the approximating set $\mathcal{J}_{app}$, i.e. the same approximating set used for $P_n$, since looking at all intervals
was computationally prohibitive. Conversely, for sample size $n = 10^4$ we examined the effect
of the approximating set by running the simulation with the penalized scan and the condensed average
likelihood ratio evaluated over all intervals containing between $\log n$ and $n/2$ observations
rather than evaluating them over an approximating set. We observed only very small changes in
power, mostly decreases, and the computation time was much longer. This confirms the theoretical
finding from Section 4 that it suffices to evaluate these statistics over an approximating set, which
yields tremendous computational advantages without sacrificing detection power.



6 Proofs 

6.1 Preliminary results 

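1. Using $\log x \le x - 1$ and a two term Taylor expansion, respectively, gives for $a, b \in (0, 1)$:

$$n \frac{(b-a)^2}{a(1-a)} \ \ge\ \mathrm{logLR}_n(a, b) \ =\ n \frac{(b-a)^2}{2\xi(1-\xi)}\, 1(a < b) \ \text{ for some } \xi \in (a, b), \quad \text{hence} \quad \mathrm{logLR}_n(a,b) \ \ge\ 2n(b-a)^2\, 1(a < b). \tag{6}$$

2. Let $I$ be an interval satisfying $\ell := \lfloor \log_2 1/F_n(I) \rfloor + 1 \le \ell_{max}$, so $m_\ell < nF_n(I) \le 2m_\ell$.
Then by construction of $\mathcal{J}_{app}(\ell)$ there exists $\tilde I \in \mathcal{J}_{app}(\ell)$ such that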

(7) F n (IAI) < 2^1 < 

and the same result holds for $\mathcal{J}^0_{app}$ with $F_n$ replaced by $F_0$ in the above.
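The two-sided bounds on the log likelihood ratio from the first preliminary result, namely $2n(b-a)^2 1(a<b) \le \mathrm{logLR}_n(a,b) \le n(b-a)^2/(a(1-a))$, can be checked numerically on random inputs. This is only a sanity check under the one-sided definition of $\mathrm{logLR}_n$; the names `log_lr` and `check_bounds` are hypothetical.

```python
import math
import random


def log_lr(a: float, b: float, n: int) -> float:
    """One-sided log likelihood ratio logLR_n(a, b); zero unless b > a."""
    if not (0.0 < a < 1.0) or b <= a:
        return 0.0
    value = b * math.log(b / a)
    if b < 1.0:
        value += (1.0 - b) * math.log((1.0 - b) / (1.0 - a))
    return n * value


def check_bounds(a: float, b: float, n: int, tol: float = 1e-9) -> bool:
    """Check 2n(b-a)^2 1(a<b) <= logLR_n(a,b) <= n(b-a)^2 / (a(1-a))."""
    value = log_lr(a, b, n)
    upper = n * (b - a) ** 2 / (a * (1.0 - a))
    lower = 2.0 * n * (b - a) ** 2 if a < b else 0.0
    return lower <= value + tol and value <= upper + tol


if __name__ == "__main__":
    rng = random.Random(1)
    pairs = [(rng.uniform(0.001, 0.999), rng.uniform(0.001, 0.999)) for _ in range(5)]
    print([check_bounds(a, b, 100) for a, b in pairs])
```

The lower bound is the Pinsker-type inequality that follows from $\xi(1-\xi) \le \frac14$ in the Taylor expansion, and the upper bound is the chi-square-type bound obtained from $\log x \le x - 1$.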

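Lemma 1. Let $J$ be an interval and $F_{r,I}$ be the distribution given in (1) with $r > 1$. Then for
$G \in \{F_0, F_{r,I}\}$:

$$(F_{r,I} - F_0)(J) \ \ge\ (F_{r,I} - F_0)(I) \Big( 1 - \frac{G(I \triangle J)}{G(I)} \Big) \quad \text{if } G(I) \le \tfrac12, \text{ and}$$

$$1 - \frac{F_0(I \setminus J)}{F_0(I)} \ \le\ \frac{F_{r,I}(J)}{F_{r,I}(I)} \ \le\ 1 + \frac{F_0(J \setminus I)}{r F_0(I)}.$$

For a proof of Lemma 1 note that

$$f_{r,I}(x) - f_0(x) = \begin{cases} d_{r,I}\, f_0(x) & \text{if } x \in I \\ -\dfrac{F_0(I)}{F_0(I^c)}\, d_{r,I}\, f_0(x) & \text{if } x \in I^c \end{cases}$$

where $d_{r,I} := r/(rF_0(I) + F_0(I^c)) - 1 > 0$ since $r > 1$. Hence

$$\begin{aligned}
(F_{r,I} - F_0)(J) &= d_{r,I}\, F_0(I \cap J) - \frac{F_0(I)}{F_0(I^c)}\, d_{r,I}\, F_0(J \setminus I) \\
&= (F_{r,I} - F_0)(I) \left( \frac{F_0(I \cap J)}{F_0(I)} - \frac{F_0(J \setminus I)}{F_0(I^c)} \right)
\end{aligned}$$

and the claim for $G = F_0$ follows by writing $F_0(I \cap J) = F_0(I) - F_0(I \setminus J)$ and noting that
$F_0(I^c) \ge F_0(I)$ when $F_0(I) \le \frac12$. The claim for $G = F_{r,I}$ follows since
$\frac{F_0(I \cap J)}{F_0(I)} - \frac{F_0(J \setminus I)}{F_0(I^c)} = \frac{F_{r,I}(I \cap J)}{F_{r,I}(I)} - \frac{F_{r,I}(J \setminus I)}{F_{r,I}(I^c)}$ by the definition of $f_{r,I}$. The second
claim follows from dividing the inequality $F_{r,I}(I) - F_{r,I}(I \setminus J) \le F_{r,I}(J) \le F_{r,I}(I) + F_{r,I}(J \setminus I)$
by $F_{r,I}(I)$ and observing

$$\frac{F_{r,I}(I \setminus J)}{F_{r,I}(I)} = \frac{F_0(I \setminus J)}{F_0(I)} \quad \text{and} \quad \frac{F_{r,I}(J \setminus I)}{F_{r,I}(I)} = \frac{F_0(J \setminus I)}{r F_0(I)}$$

by the definition of $f_{r,I}$. $\Box$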
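The inequalities of Lemma 1 can be verified numerically in the uniform case $F_0 = U[0,1]$, where all probabilities reduce to interval lengths. The sketch below checks the first inequality with $G = F_0$ (for $|I| \le \frac12$) and the two-sided bound on $F_{r,I}(J)/F_{r,I}(I)$ in the form $1 - |I \setminus J|/|I| \le F_{r,I}(J)/F_{r,I}(I) \le 1 + |J \setminus I|/(r|I|)$, as obtained in the proof. The function names are hypothetical.

```python
import random


def overlap(i, j):
    """Length of the intersection of two intervals given as (lo, hi) tuples."""
    return max(0.0, min(i[1], j[1]) - max(i[0], j[0]))


def bump_prob(j, i, r):
    """F_{r,I}(J) when F_0 is uniform on [0, 1] and the bump factor is r on I."""
    len_i = i[1] - i[0]
    ov = overlap(i, j)
    return (r * ov + (j[1] - j[0]) - ov) / (r * len_i + 1.0 - len_i)


def lemma1_holds(i, j, r, tol=1e-12):
    """Check both inequalities of Lemma 1 with G = F_0 uniform."""
    len_i, len_j, ov = i[1] - i[0], j[1] - j[0], overlap(i, j)
    sym_diff = len_i + len_j - 2.0 * ov  # measure of the symmetric difference
    diff_j = bump_prob(j, i, r) - len_j
    diff_i = bump_prob(i, i, r) - len_i
    first = len_i > 0.5 or diff_j >= diff_i * (1.0 - sym_diff / len_i) - tol
    ratio = bump_prob(j, i, r) / bump_prob(i, i, r)
    second = (1.0 - (len_i - ov) / len_i - tol <= ratio
              <= 1.0 + (len_j - ov) / (r * len_i) + tol)
    return first and second


if __name__ == "__main__":
    print(lemma1_holds((0.4, 0.5), (0.42, 0.55), 3.0))
```

The check makes the role of the symmetric difference visible: the closer $J$ is to $I$, the closer the excess mass on $J$ is to the excess mass on $I$.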



The following lemma is an extension of Proposition 2.1 in Dümbgen (1998):

Lemma 2. Denote the two-sided log likelihood ratio statistic by $\mathrm{logLR}_n^{two}(a, b) := nb \log \frac{b}{a} +
n(1 - b) \log \frac{1-b}{1-a}$ and the one-sided versions by $\mathrm{logLR}_n^{left}(a, b) := \mathrm{logLR}_n^{two}(a, b)\,1(a < b)$ and
by $\mathrm{logLR}_n^{right}(a, b) := \mathrm{logLR}_n^{two}(a, b)\,1(a > b)$. Hence $\mathrm{logLR}_n^{left}$ equals $\mathrm{logLR}_n$ used above.
Let $U_1, \ldots, U_n$ be i.i.d. $U[0, 1]$, so $U_{(k)} - U_{(j)} \sim \mathrm{beta}(k - j, n + 1 - k + j)$ for $1 \le j < k \le n$.
Set $p_{jk} := \frac{k-j}{n+1}$. Then for $p \in (0, 1)$ and $t > 0$:



IPf J2logLR%>°(U (k) - U U) ,p) >t)< 2exp<j -minf^, <>i±i) ^ + „ 



P - Pjk 



< < 



2exp(-f 



p ' 1-p ) n 2 ' '" l(p > Pjk) - Pjk ■ 
ifp := p jk 



2exp 



(k-j) t 2 



+ 3 5TP := 



fc-j+1 



< 



In more detail:

$$\mathbb{P}\Big( \mathrm{logLR}_n^{left}(U_{(k)} - U_{(j)}, p) > t \Big) \le \exp\left\{ -\frac{p_{jk}}{p}\,\frac{n+1}{n} \left( t - n\,\frac{(p - p_{jk})\big(p - p_{jk} 1(p_{jk} > p)\big)}{p_{jk}(1 - p_{jk})} \right) \right\}$$

$$\mathbb{P}\Big( \mathrm{logLR}_n^{right}(U_{(k)} - U_{(j)}, p) > t \Big) \le \exp\left\{ -\frac{1 - p_{jk}}{1 - p}\,\frac{n+1}{n} \left( t - n\,\frac{(p_{jk} - p)\big[1 - p - (1 - p_{jk}) 1(p_{jk} < p)\big]}{p_{jk}(1 - p_{jk})} \right) \right\}$$

Hence in the case of random intervals whose lengths follow the beta distribution, $\sqrt{2\,\mathrm{logLR}_n^{two}}$
has subgaussian tails for $p = p_{jk}$. For $p$ close to $p_{jk}$ the tails are still subgaussian but with a scale
factor that is smaller in one tail and larger in the other.
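The distributional fact behind Lemma 2 is that a spacing $U_{(k)} - U_{(j)}$ of uniform order statistics follows a beta$(k-j, n+1-k+j)$ law, whose mean is $p_{jk} = (k-j)/(n+1)$. A small Monte Carlo experiment illustrates this centering; it is only a sketch, and the function name `spacing_mean` is hypothetical.

```python
import random


def spacing_mean(n: int, j: int, k: int, n_sim: int, rng: random.Random) -> float:
    """Monte Carlo mean of U_(k) - U_(j) for n i.i.d. U[0,1] draws (1-based ranks)."""
    total = 0.0
    for _ in range(n_sim):
        u = sorted(rng.random() for _ in range(n))
        total += u[k - 1] - u[j - 1]
    return total / n_sim


if __name__ == "__main__":
    rng = random.Random(3)
    n, j, k = 20, 3, 10
    print(spacing_mean(n, j, k, 20000, rng), (k - j) / (n + 1))
```

The simulated mean is close to $(k-j)/(n+1)$ rather than to the empirical measure $(k-j+1)/n$ of the closed interval, which is why the lemma treats the cases $p = p_{jk}$ and $p$ close to $p_{jk}$ separately.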

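Proof of Lemma 2: For $u \in (0, p)$:

$$\begin{aligned}
\mathrm{logLR}_n^{left}(u, p) &= \mathrm{logLR}_n^{two}(u, p) \\
&= \frac{p}{p_{jk}}\, \mathrm{logLR}_n^{two}(u, p_{jk}) + \mathrm{logLR}_n^{two}(p_{jk}, p) + n\,\frac{p_{jk} - p}{p_{jk}} \log \frac{1 - p_{jk}}{1 - u} \\
&\le \frac{p}{p_{jk}}\, \mathrm{logLR}_n^{two}(u, p_{jk}) + n\,\frac{(p_{jk} - p)^2}{p_{jk}(1 - p_{jk})} + n\,\frac{p_{jk} - p}{p_{jk}} \log(1 - p_{jk})\, 1(p_{jk} < p) \qquad \text{by (6)} \\
&\le \frac{p}{p_{jk}}\, \mathrm{logLR}_n^{left}(u, p_{jk}) + n\,\frac{(p - p_{jk})\big(p - p_{jk} 1(p_{jk} > p)\big)}{p_{jk}(1 - p_{jk})}
\end{aligned}$$

since $-(1 - p_{jk}) \log(1 - p_{jk}) \le p_{jk}$, and because $\mathrm{logLR}_n^{left}(u, p)$ is non-increasing in $u$ while
$\mathrm{logLR}_n^{two}(u, p_{jk})$ is increasing for $u > p_{jk}$. Hence the inequality above must also hold with
$\mathrm{logLR}_n^{two}(u, p_{jk})$ replaced by $\mathrm{logLR}_n^{two}(\min(u, p_{jk}), p_{jk}) = \mathrm{logLR}_n^{left}(u, p_{jk})$.

Now the probability inequality for $\mathrm{logLR}_n^{left}$ follows from the above inequality together with
$\mathbb{P}\big( \mathrm{logLR}_n^{left}(U_{(k)} - U_{(j)}, p_{jk}) > t \big) \le \exp\big( -\frac{n+1}{n} t \big)$, which is a consequence of Proposition 2.1
in Dümbgen (1998). The inequality for the right tail follows analogously, and the tail bound for
$\sqrt{2\,\mathrm{logLR}_n^{two}}$ obtains as a consequence. $\Box$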

6.2 Proof of Theorem 1

Under $H_0$, $(F_0(X_1), \ldots, F_0(X_n)) \stackrel{d}{=} (U_1, \ldots, U_n)$, where $U_1, \ldots, U_n$ are i.i.d. $U[0,1]$. We divide
the collection of intervals under consideration into a collection of small intervals

$$\mathcal{S}_n := \big\{ (j, k) : 1 \le j < k \le n,\ \log n \le k - j \le \log^2 n \big\}$$

and the collection of the remaining intervals

$$\mathcal{L}_n := \big\{ (j, k) : 1 \le j < k \le n,\ \log^2 n < k - j \le n/2 \big\}.$$

The cardinality of $\mathcal{S}_n$ is small enough that we can use the union bound to show

$$\max_{(j,k) \in \mathcal{S}_n} \left( \sqrt{2\,\mathrm{logLR}_n\Big(U_{(k)} - U_{(j)}, \frac{k-j+1}{n}\Big)} - \sqrt{2 \log \frac{e n^2}{(k-j)(n-k+j)}} \right) = O_p(1). \tag{8}$$

For the larger intervals we approximate $\sqrt{2\,\mathrm{logLR}_n}$ by the normalized increment of the uniform
quantile process:


(9) max 



2logLR n (U {k) -U(j), 



k-j + 1 



n 



fc-i / 1 _ k-j 



n \ n 



o P (i) 



The normalized increments of the uniform quantile process can in turn be approximated on $\mathcal{L}_n$
by the normalized increments of a Brownian bridge $B$:



(10) 



max 



k-j 

n 



u (k) - U(j) 



B( 



Bt 



k-j 



k-j 
n 



k-j 



k-j 
n 



o P (i) 



Theorem 1 follows from (8)-(10) together with

$$\sup_{0 \le s < t \le 1} \left( \frac{|B(t) - B(s)|}{\sqrt{(t-s)(1 - (t-s))}} - \sqrt{2 \log \frac{e}{(t-s)(1 - (t-s))}} \right) < \infty \quad \text{a.s.} \tag{11}$$

The above results also show how one might design an appropriate penalty if one wishes to scan
over very large intervals, i.e. $(k - j)/n$ close to 1. Indeed, it is well known that the normalized
uniform quantile process blows up both at 0 and at 1, see Ch. 16 in Shorack and Wellner (1986).

Proof of (8): Clearly $\#\mathcal{S}_n \le n \log^2 n$. Hence the tail inequality for $\sqrt{2\,\mathrm{logLR}_n}$ given in
Lemma 2 yields for $C > 0$:


PI max I \l2logLR n (U( k ) -Uu), !- 

f k — j 

< nflog 2 n) max 2e 3 exp < 

- v \mes n p | 2{k-j + i) 



2 log 



en' 



(k- j)(n -k + j) 



> C 



C + W2I0B 



en' 



< 2e 3 n(log 2 n) expj- (l 



1 \/C 2 , en „ /~~ en 
— + log + C. 2 log — j 



logn/ V 2 " log z n 
since (k — j)(n — k + j) < n log 2 n on 5 n 



(k-j)(n -k + j) 

)} 



log w 



< 2e' 



'(log 4 n) exp j- 



1 \ (C 



logn/ V 2 



— + Cj2\o. 



log n 



} 
Proof of (9): By (6)



2logLR n [U {k) - U(j), 



k-j + 1 



n 



k-j+1 
n 



k-j I i k-j 



n \ n 



(12) 



k-j + 1 



n 



(U {k) - U (j) ) 



1 



a 

\ n 



for £ between fc and U^) — U^y On the event 



A n (C) :-- 



U(k)-U {j y 



k-j 



ii 



< C+W210G 



en' 



— j)( n ~~ & + j) ' V n 



k ~i( 1 _^JL ] forall 



n 



we have 



fc-j 



< (C+vglo^) ^ < 2 (fc-j) eventually _ Hence 

— Vk—J n ~ vlogn n J 



< 



ck—j 



fc - j 



/I 



+ 



1 



< 



An 



+ 



4n 



k-j 



n 



(k — j)\J\ogn {n — k + j)y/\ogn 
4 



k-j (l_ k-j 
n \ n 



n 



Since < b — a < 6/2 for a, b > implies — \fa\ < (b — a)/y/b, (fl2l is not larger than 



^(i-¥)iog 



< 



4 c+«/2io. 



(k-j)(n-k+j) 



V 



\/log 



■;?. 



+ 



< 4 C + V2bfn + 



Vlog n (logn) 3 / 2 



for $(j,k) \in \mathcal{I}_n$.

(9) follows since $\lim_{C\to\infty} \liminf_{n\to\infty} P(A_n(C)) = 1$ by (10) and (11), and replacing $\frac{k-j}{n}$
with $\frac{k-j+1}{n}$ in the numerator of the second term in (9) incurs a difference bounded by $8/\log n$ as
seen above.

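Proof of (10): By the Hungarian construction, see Theorem 12.2.2 in Shorack and Wellner (1986),
there exists a version $B_n$ of the Brownian bridge on [0, 1] such that

$$
\begin{aligned}
& \limsup_n \max_{(j,k)\in\mathcal{I}_n} \Big| \sqrt{n}\Big(\frac{k-j}{n} - (U_{(k)}-U_{(j)})\Big) - \Big(B_n\big(\tfrac{k}{n}\big) - B_n\big(\tfrac{j}{n}\big)\Big) \Big| \\
&\quad \le \limsup_n \max_{(j,k)\in\mathcal{I}_n} \Big( \Big| \sqrt{n}\Big(\frac{k}{n} - U_{(k)}\Big) - B_n\big(\tfrac{k}{n}\big) \Big| + \Big| \sqrt{n}\Big(U_{(j)} - \frac{j}{n}\Big) + B_n\big(\tfrac{j}{n}\big) \Big| \Big) \\
&\quad \le 2M\,\frac{\log n}{\sqrt{n}} \quad \text{a.s. for some } M < \infty.
\end{aligned}
$$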



The claim follows since J*=i\l- *=ij > for (j, k) G X n . 

Proof of (11): This can be proved using Theorem 6.1 in Dümbgen and Spokoiny (2001): On
the set of all intervals $\mathcal{T} := \{(s,t] : 0 \le s < t \le 1\}$ define the metric $\rho$ via $\rho^2((s,t],(s',t']) :=
|s-s'| + |t-t'|$ and the stochastic process $X((s,t]) := B(t) - B(s)$. With $\sigma^2((s,t]) :=
(t-s)(1-t+s)$ one readily checks that $\sigma^2((s,t]) \le \sigma^2((s',t']) + \rho^2((s,t],(s',t'])$. Since
$X((s,t])/\sigma((s,t]) \sim N(0,1)$, the subgaussian tail condition (i) of said theorem holds. To prove
the subgaussian tail condition (ii) for the variation of $X$, write $B(t) = W(t) - tW(1)$ for a
Brownian motion $W$. Then



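$$
\begin{aligned}
\frac{X((s,t]) - X((s',t'])}{\rho((s,t],(s',t'])}
&= \frac{W((s,t]\setminus(s',t']) - W((s',t']\setminus(s,t])}{\sqrt{|s-s'|+|t-t'|}} - W(1)\,\frac{(t-s)-(t'-s')}{\sqrt{|s-s'|+|t-t'|}} \\
&\sim N\Big(0,\ \frac{\mathrm{Leb}((s,t]\triangle(s',t'])}{|s-s'|+|t-t'|} + \frac{((t-s)-(t'-s'))^2}{|s-s'|+|t-t'|} - 2\,\frac{\mathrm{Leb}((s,t]\triangle(s',t'])\,((t-s)-(t'-s'))}{|s-s'|+|t-t'|}\Big).
\end{aligned}
$$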



The latter variance is not more than four, hence condition (ii) holds with L = 1 and M = 8.
Finally, a calculation similar to that in Dümbgen and Spokoiny (2001) shows that V = 1. (11) follows.
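
The inequality $\sigma^2((s,t]) \le \sigma^2((s',t']) + \rho^2((s,t],(s',t'])$ used above follows from the identity $\delta(1-\delta) - \delta'(1-\delta') = (\delta-\delta')(1-\delta-\delta')$ with $\delta := t-s$ and $\delta' := t'-s'$, together with $|1-\delta-\delta'| \le 1$ and $|\delta-\delta'| \le |s-s'|+|t-t'|$. As a sanity check, the following sketch tests it on random pairs of intervals. The function names are hypothetical, and a Monte Carlo check of this kind illustrates the inequality but does not prove it.

```python
import random


def sigma2(s, t):
    """sigma^2((s,t]) = (t - s)(1 - (t - s))."""
    d = t - s
    return d * (1.0 - d)


def rho2(s, t, s2, t2):
    """rho^2((s,t],(s2,t2]) = |s - s2| + |t - t2|."""
    return abs(s - s2) + abs(t - t2)


def max_violation(num_pairs=100_000, seed=0):
    """Largest value of sigma2(I) - sigma2(I') - rho2(I, I') over random interval pairs."""
    rng = random.Random(seed)
    worst = -float("inf")
    for _ in range(num_pairs):
        s, t = sorted((rng.random(), rng.random()))
        s2, t2 = sorted((rng.random(), rng.random()))
        gap = sigma2(s, t) - sigma2(s2, t2) - rho2(s, t, s2, t2)
        worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    # A non-positive value (up to rounding) means no violation was found.
    print(max_violation())
```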

Checking condition (ii), i.e. establishing an exponential inequality for the variation of the
process under consideration, is technically the most challenging aspect of applying said Theorem 6.1,
see also e.g. Dümbgen and Walther (2008). Here we approached this problem via the Hungarian
construction, which leads to the simpler task of establishing an exponential inequality for the
variation of the increments of a Brownian bridge. □
6.3 Proof of Proposition 2



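We use $F_0\big((X_{(j)}, X_{(k)}]\big) \stackrel{d}{=} U_{(k)} - U_{(j)}$ for $U_1, \ldots, U_n$ i.i.d. U[0,1] and define the event

$$
B_{m,n} := \Big\{ \sqrt{2\log LR_n\big(U_{(k)}-U_{(j)}, \tfrac{k-j}{n}\big)} \le \sqrt{2\log\frac{en^2}{(k-j)(n-k+j)}} + m \ \text{ for all } (j,k) \in \mathcal{I}_{app} \Big\}.
$$

We will show that for $(j,k) \in \mathcal{I}_{app}(\ell)$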

(13) $E\Big( LR_n\big(U_{(k)} - U_{(j)}, \tfrac{k-j}{n}\big)\, 1_{B_{m,n}} \Big) \le 14(\sqrt{2\ell} + m)$ eventually, uniformly in

$(j,k)$ and $\ell$. Then $A_n^{cond} = O_p(1)$ can be shown as in the proof of Theorem 3 in Chan and
Walther (2011) since $\lim_{m\to\infty} \liminf_{n\to\infty} P(B_{m,n}) = 1$ by Theorem 1, which is readily seen to
hold also with $\frac{k-j}{n}$ in place of $\frac{k-j+1}{n}$ in the definition of $P^{U}$.

To prove (13) fix $(j,k) \in \mathcal{I}_{app}(\ell)$. We will show below that on the event $B_{m,n}$ for $n \ge n_0(m)$

(a) u := C// fe ) — [/(j) falls in an interval of length at most 



1 



C(^—^)+m), where C(8) := \2 log i, 



n \ \ n / / V <5 ' 

and 

(b) - > 

Using the fact that $U_{(k)} - U_{(j)} \sim \mathrm{beta}(k-j,\, n+1-k+j)$ we can then compute

ie(lrJu^ -u, l> '~ r 



<LR n {u {k) -U {j) ,—l)l Bm>n 



* " 'V-' ' 1 - ^ B " fc+ V + i - 1 (1 _ nr - fc+ , n! ^ 



B \ nu ' \ 1 — u J (k — j — \)\{n — k + j)\ 

< — / / -. r- du by Stirling's formula 

2tt J b nu \ k-j I k-j 



^ 32 ( c (¥)+'" 

by (a) and (b). (13) follows since $(j,k) \in \mathcal{I}_{app}(\ell)$ implies $\frac{k-j}{n} > 2^{-\ell}$.
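
The integrand in the computation above simplifies because the powers of $1-u$ cancel. Assuming the binomial form $LR_n(u,\delta) = (\delta/u)^{n\delta}\big((1-\delta)/(1-u)\big)^{n(1-\delta)}$ with $\delta = (k-j)/n$, multiplying by the beta$(k-j,\, n+1-k+j)$ density gives $\frac{n!}{(k-j-1)!\,(n-k+j)!}\,\delta^{k-j}(1-\delta)^{n-k+j}\cdot\frac{1}{u}$, and Stirling's formula approximates the constant by $(k-j)/\sqrt{2\pi n\delta(1-\delta)}$. The following sketch checks both facts numerically on the log scale. The function names are hypothetical and the form of $LR_n$ is an assumption about the definition given earlier in the paper.

```python
import math


def log_lr_times_beta(u, j, k, n):
    """log of LR_n(u, (k-j)/n) times the beta(k-j, n+1-k+j) density at u."""
    d = k - j
    delta = d / n
    log_lr = d * math.log(delta / u) + (n - d) * math.log((1.0 - delta) / (1.0 - u))
    log_beta = (math.lgamma(n + 1) - math.lgamma(d) - math.lgamma(n + 1 - d)
                + (d - 1) * math.log(u) + (n - d) * math.log(1.0 - u))
    return log_lr + log_beta


def log_constant(j, k, n):
    """log of n! / ((k-j-1)! (n-k+j)!) * delta^(k-j) * (1-delta)^(n-k+j)."""
    d = k - j
    delta = d / n
    return (math.lgamma(n + 1) - math.lgamma(d) - math.lgamma(n + 1 - d)
            + d * math.log(delta) + (n - d) * math.log(1.0 - delta))


if __name__ == "__main__":
    j, k, n = 10, 110, 1000
    for u in (0.05, 0.1, 0.2):
        # product * u should not depend on u
        print(u, log_lr_times_beta(u, j, k, n) + math.log(u), log_constant(j, k, n))
    delta = (k - j) / n
    print(math.exp(log_constant(j, k, n)),
          (k - j) / math.sqrt(2.0 * math.pi * n * delta * (1.0 - delta)))
```

Since the product is a constant times $1/u$, bounding the expectation reduces to controlling the length of the range of $u$ and a lower bound on $u$, which is what (a) and (b) provide.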

(a) follows for $n \ge n_0(m)$ from the inequality given in Proposition 2.1 in Dümbgen (1998)
together with the fact that $k - j \ge \log n$ by the construction of $\mathcal{I}_{app}$. Said inequality yields in
particular



k — j 
u > 



■;?. 



—)',/, j\ \ k-j( C(*=i) + m 
C -)+m) > -( 1 



\ n V V n ) ) n V ^Jk — j 



Thus in the case k — j > 4 log n, (b) follows since y/k — j > | \^\~n) + mj for n > no(m). 
In the case k — j = b log n with b G [1,4), consider u := Then a standard calculation shows 
that logLR n (u, — i ) > ( § + o(l) ) log n, where the o(l) term is uniform in b. Thus this choice 



of u violates the inequality defining B m ^ n for n > no(m). S ince logLR n [u, increases as 
moves away from this implies that we must have u > for n > no(m), completing the 
proof of (b). □ 

6.4 Proof of Theorem 2

We first prove the claim about P°. Consider an alternative (Q]) that satisfies (O and also Fq(I) > 
2-4m». Then £ := [log 2 1 / F (I) \ + 1 < £ ma:E , so by © there exists / G J° app {Z) with 
Fq(IAI) < F °^0 . Set 6 n := e ra /21og F e , n , so 6 n — > oo by assumption Q. On the event 



u 



A n := {-Pn(-f) > ^r,/0O - V FrJ i J)fe " } condition © implies F n (J) > F (I) and hence 



2logLR n (F (I),F n (I)) > ^ l^Lj^l by© 



> V n — — 7==^ V "n on Ai since x — )• — 



> ^./ffl-W/, l y nr byLemmam 

> f./21og— ^— + 6 n )(l 2 ) - 



^/CO A 3, /tog 



> ,/21og — + -5 n - 1 - v 7 ^ 
- 1 S 3F n (/) 3 V 



where the last inequality hold by Lemma[T]and on the event B n := ^F r j(I) < 2F n (I)\. Cheby- 
shev's inequality gives W(A n ) > 1 — ^ and IP(£>n) > 1 — nF 4 > 1 — where the last 
inequality follows with Lemma[T]from F T j{I) > 2 log n/n, which in turn is a consequence of (0]). 
Hence P® — > oo with probability converging to 1, uniformly in alternatives satisfying ©. On the 
other hand, the critical value of P® stays bounded by Proposition [T] 
To prove the claim for P n note that by © we can find I G J~ a pp(£) such that F n (IAI) <

F (I) 

™\j by taking £ := |_log2 l/-^n(-0J + 1- This index satisfies £ < £ max - It is readily seen 
that © implies F rJ (I) > 2 ^ n +^V^gn for n large enoug h, hence W(\F n {I) - F rJ (I)\ < 
| 2 ] ° g 11 — F r j(I)\) > 1 — by Chebyshev. This implies that with probability converging to 1 
we can now guarantee firstly that F n (I) > 21 ° g " and hence £ < £ max , and secondly, F n (I) < 
2F rJ (I), hence F n (JAJ) < ^jf-. Note that I is a random interval since J~ app is constructed 
w.r.t. F n . Hence the above proof for fixed / does not go through any more, but the claim can be 
established as in the proof for A™ nd below. There we consider I C I, which can be enforced 
above while still guaranteeing £ < £ ma x- Alternatively, (PT5T ) can be readily extended to cover 
the case I </L I. The approximating set X app used for A c ° nd differs from JTapp used for P n in 
the spacing parameter dg, but that is not relevant for the part of the proof below that establishes 



2logLR n (F (I),F n (I)) > J2 log j^jj + B n . 
To prove the claim for A c ° nd we consider the collection of all intervals in the approximating 
set whose endpoints are close to those of /: 

A(I) := {/ G X app (£) : I C / and F n (I) > F n (I)(l - r/ n /2)} 



where r? n := min(l, — ?=^r^ J and £ := Llog 2 Fn(f)(1 _^ /4) J + 1- Hence m t < nF n (I)(l - 
%/4) < 2rri£. As above one can show that £ G {2, . . . , £ ma x} with probability converging to 1. 
As in Lemma 2 of Chan and Walther (2011) one finds 

(14) * A W > C ^ Fn(/) 



#Xapp " (log 2 e/F n (/)) 8/5 
Standard considerations using Lemma Q] and (fT5T > show that the event jinfj^^ l\F n (I) > 
F (I)J = 1 j has probability converging to 1, hence on this event



inf x /2logLR n (F (I),F n (I)) > inf ^J±2LJM£L by © 

UAH} 



IeA{I) ^F (I)VF n (I) 

F r ,j(I) - F (I) F n (i)-F r ,i(i) 
> _ ml \Jn — ; — sup \Jn- 



I&A{I) ^F ryI (I)yF n {I) IEA(I) ^(/)VF n (/) 

-O p (l) by CD) 



ieA(l) 



F r,l(I) 



\/logn 



^ ^ (.-w.lfi-^)) 



> ,/21oe 



+ B n where B n := 6 n /9 + O p (l) 



F r>J (J)(l-.F P>/ (I)) 
and where the second to last inequality follows from LemmaQ] since 



1 fi,/(JAJ) F W (J) / 

F r ,i(J) F ri/ (J) F n (I)l + p VVbgr ; 



by CD) 



> 1- W2 1 + 



A/logn 



by the definition of .4.(1). 



Hence inf /e ^ (J) Li^F (/),F n (J)) > ^ ^^/.^ exp{i? n [bJ2+^2 log T ^ (7J ) } 
and so A c ™ d —> oo as in the proof of Theorem 3 in Chan and Walther (2011), using (fT5T ). Since 
the critical value of A% md stays bounded by Proposition 12 the claim follows. 

It remains to show 



(15) 



sup 

ieA(l) 



Fr,l{I) 



Fn(D 



o, 



^/logn 



Denote by $X_{(a)}$ the smallest and by $X_{(b)}$ the largest observation in $I$. Writing $d := b - a$ and
$U_i = F_{r,I}(X_i)$:



sup \fn 
ieA(l) 



F n (I) - F rJ (I) 



Fn{I) 



< 2 max \fn 

j=a,...,a+d 



o P (i) 



by well known facts. Together with F n (I) > -^p for I £ X app , this implies (Q3]>. 



□ 